This small suite of tools is compatible with CRISPResso v.2.0.37 and has the following modes:
Download CRISPResso2_downstream source files from the CRISPResso2 Github repository.
The necessary package dependencies are best installed in a conda environment.
Use one of the crispresso_downstream_env.yml files to generate a crispresso_downstream_env environment:
For local use on Mac OSX systems:
conda env create -f crispresso_downstream_env_osx.yml
For use on non-Mac OSX systems:
conda env create -f crispresso_downstream_env_non_osx.yml
The environment contains the following dependencies:
python=2.7.15
R>=3.5.1
r-optparse
r-tidyselect
r-tidyverse
r-RColorBrewer
r-grid
r-gtable
r-extrafont
r-scales
r-effsize
r-cowplot
Once the conda environment is generated, activate it with the following command:
conda activate crispresso_downstream_env
To deactivate the conda enviroment after running the analyses, input the following:
conda deactivate crispresso_downstream_env
All commands must be input from within the CRISPResso2_downstream source directory.
-c, –crisp_out_dir: path to folder storing CRISPRessoPooled output directories
-d, –dataID: data identifier common to all names of CRISPRessoPooled input amplicons (regular expressions accepted)
-m, –mode: analysis modes [default collapse_only]:
-p, –percent_freq_cutoff: minimum frequency an allele must appear in any sample/amplicon to be included in the collapsed output table
[default 0]
–CRISPRessoBatch: for any collapse mode: Use CRISPRessoBatch files as inputs [default FALSE]
–CRISPRessoPooled: for any collapse mode: Use CRISPRessoPooled files as inputs [default FALSE]
-n, –noSub: do not include substitutions as edits in output table [default FALSE]
-f, –conversion_nuc_from: for any BE mode: the nucleotide targeted by the base editor [default C]
-t, –conversion_nuc_to: for any BE mode: the nucleotide(s) produced by the base editor. If multiple nucleotides, enter all letters without separators (ex. ATCG) [default T]
-s, –ot_sample_csv: for any OT mode: path to sample csv file (see “Input files” below for details)
-r, –ref_seq_csv: for any OT mode: path to reference and guide sequences csv file (see “Input files” below for details)
-b, –be_summary_exists: for OT_only mode: add this flag if BE summaries already exist AND set [-f, -t] if not default values) [default FALSE]
Follow the CRISPResso2 documentation to run either CRISPRessoBatch or CRISPRessoPooled.
When running CRISPRessoPooled and multiple guides are used for the same amplicon, the full names of all guides must be included and separated by underscores in pooled.txt (ex. when guides/targets 1620_OT_10 and 1620_OT_14 are on the same amplicon, name the run “1620_OT_10_1620_OT_14”).
For easy identification and sorting, it is best to name off-targets guide_OT_#### (ex. 1620_OT_12) beginning with the on-target as “0” (ex. 1620_OT_000).
Set –min_reads_to_use_region lower if low read coverage is expected for any amplicon as amplicons with fewer reads aligned will not be included in the analysis.
All input files (ref_seqs_csv and ot_sample_csv) should be in the same directory as the CRISPRessoBatch/CRISPRessoPooled run output files.
File linking CRISPRessoPooled output files to biological samples/conditions necessary for off-target analysis (not necessary for collapse or BE modes). As the off-target visualization tool takes up to 30 samples, this table may have up to 30 rows (not including headers).
| donor | condition | CRISPResso_dir_name | sample_name | R_color | R_fill | R_shape |
|---|---|---|---|---|---|---|
| Donor1_1 | mock | 20200624_DB0024_S1 | Donor1_1 mock | red | red | 1 |
| Donor1_1 | edited | 20200624_DB0024_S2 | Donor1_1 edit | red | red | 16 |
| Donor1_2 | mock | 20200624_DB0024_S3 | Donor1_2 mock | blue | blue | 1 |
| Donor1_2 | edited | 20200624_DB0024_S4 | Donor1_2 edit | blue | blue | 16 |
| Donor2_1 | mock | 20200624_DB0024_S5 | Donor2_1 mock | orange | orange | 1 |
| Donor2_1 | edited | 20200624_DB0024_S6 | Donor2_1 edit | orange | orange | 16 |
| Donor2_2 | mock | 20200624_DB0024_S7 | Donor2_2 mock | green | green | 1 |
| Donor2_2 | edited | 20200624_DB0024_S8 | Donor2_2 edit | green | green | 16 |
Necessary columns:
donor = cell donor ID + technical replicate (if applicable)
condition = either “mock” or “edited”
CRISPResso_dir_name = names of CRISPRessoPooled output directories following CRISPRessoPooled_on_
sample_name = sample names to be displayed in off-target output figure (must include donor names exactly)
Optional aesthetics columns (see options below):
R_color = R grDevices::colors color names or hex codes indicating the colors differentiating CRISPRessoPooled runs/samples in the off-target editing dotplot. If the column does not exist, default colors are generated.
R_fill = R grDevices::colors color names or hex codes, same as R_color options. By default, R_fill options are the same as R_color options.
R_shape = R ggplot2 shape options. By default, control/mock samples are unfilled circles (16), and edited samples are filled circles (1).
File containing the reference amplicon sequence, guide sequence, and PAM sequence for each off-target. The table is very similar to the CRIPSRessoPooled pooled.txt file and is necessary for off-target analysis (not necessary for collapse or BE modes).
| ot_id | amplicon_sequence | aligned_guide_seq | guide_sequence | pam |
|---|---|---|---|---|
| 1620_OT_0 | CTACAAAATTCAAATGC… | AGCATCCAGGCCCCAGGAA | AGCATC-CAGGCCCCAGGAA | TGG |
| 1620_OT_1 | GAAGATAAAATGTCTGGGC… | CTGGTCTCAGACTCCAGGAA | CTGGTCTCAGACTCCAGGAA | TGG |
| 1620_OT_2 | TCTGTCTCTCGCTTTGAA… | ACAGTCACAGGCTCCAGGAT | ACAGTCACAGGCTCCAGGAT | AAG |
| 1620_OT_3 | GTATTTGGATTCTCAGCTCA… | TTTATCACAGCCCCAGGAAA | TTTATCACAGCCCCAGGAAA | GGT |
| 1620_OT_4 | TCCTTTCAAGCAATGGTCAG… | ACACTCACAGGCTCCAGAAA | ACACTCACAGGCTCCAGAAA | GGG |
| 1620_OT_5 | TCCTTTCAAGCAATGGTCAG… | CTGGCTCACAGGCTCCAGAAA | CTGGCTCACAGGCTCCAGAAA | AGG |
Input columns:
Each off-target “guide” should be in a separate row regardless of whether they are on the same amplicon. With the exception of ot_id and guide_sequence, all sequences must be uppercase ATCG.
ot_id = off-target names (should match the names in the CRISPRessoPooled pooled.txt file exactly)
amplicon_sequence = from the CRISPRessoPooled pooled.txt file
aligned_guide_seq = 20-bp guide sequence in the CRISPRessoPooled pooled.txt file (no bulge placeholders “-” or lowercase letters)
guide_sequence = guide sequence to be displayed in off-target output figures (can contain bulge “-” placeholders and lowercase letters). The sequences in this column must be unique.
pam = PAM sequence to be displayed in off-target output figures (do not need to be cannonical PAMs)
The Collapse mode uses the Alleles_frequency_table_around_sgRNA_[guideseq].txt output by CRISPResso to summarize each indel in more readable terms (deletion lengths and locations, insertion sequences and locations, substitution conversions and locations) with the first nucleotide of the guide as base pair index 1. Only modifications within the set quantification window are counted.
Example Alleles_frequency_table_around_sgRNA_[guideseq].txt
Unlike in the CRISPResso allele tables, each row is an unique indel, not a unique read. The Collapse mode collapses all reads with the same indel. Each indel must appear at or greater than the percent_freq_cutoff set by the user in at least one sample, or it is filtered out of the table before all allele frequencies are renormalized to 100%.
collapse_only mode
Rscript crispresso_downstream_v2.0.37.R -c /Users/local_Jing_BE/1620/20200711_1620_ONESeq_rhAMPSeq_triplicates -d OT -m collapse_only -p 0 --CRISPRessoPooled
[CRISPResso_RUN_NAME]_collapsed_[percent_freq_cutoff].csv
Table of all collapsed indels across all samples in the CRISPRessoBatch or CRISPRessoPooled run generated from Alleles_frequency_table_around_sgRNA_[guideseq].txt (CRISPResso output).
Columns:
Aligned_Sequence = example Aligned_Sequence around the quantification window center for one sample (kept as an artifact of CRISPResso output)
Reference_Sequence = example Reference_Sequence around the quantification window center for one sample (kept as an artifact of CRISPResso output)
Unedited = logical indicating whether the row contains the frequency of unedited alleles
n_deleted = number of base pairs deleted in deletions overlapping the quantification window
n_inserted = number of base pairs inserted in insertions overlapping the quantification window
n_mutated = number of base pairs modified in substitutions overlapping the quantification window
indel = detailed indel (sorted alphanumerically)
*[run]_reads__[GUIDE_SEQ]* = number of reads of the indel (row) within the sample/CRISPResso run (column)
*[run]__[GUIDE_SEQ]* = percent frequency of the indel (row) within the sample/CRISPResso run (column)
The BE mode filters the collapsed allele tables by indel to only account for base editing (target and resulting nucleotides set by -f and -t parameters) in the first 3-10 bps the guide and for indels in overlapping the 17-18 bps, the Cas9 cut site. All other edits are counted as “Unedited” in the output table.
collapse_BE mode
Rscript 20200610_comLine_tester.R -c /Users/local_Jing_BE/1620/20200619_1620_ONESeq_rhAMPSeq_triplicates -p 0 -d 1620 -m collapse_BE -f C -t T --CRISPRessoPooled
BE_only mode
Rscript crispresso_downstream_v2.0.37.R -c /Users/local_Jing_BE/1620/20200619_CRISPRessoBatch_1620_OT_111 -p 0 -d Donor -m BE_only -f C -t ATG
[CRISPResso_RUN_NAME]_BE_summary_[conversion].csv
Filtered allele table. The columns are the same as those of the collapsed allele tables (listed above.)
OT mode is generated for the purpose of visualizing off-target editing, but it can be re-purposed for any pool of amplicons ran in CRISPRessoPooled.
The OT mode performs simple t-test comparing percent editing in edited v. control samples and generates composite figures summarizing off-target editing.
collapse_BE_OT mode
Rscript crispresso_downstream_v2.0.37.R -c /Users/local_Jing_BE/1620/20200619_1620_ONESeq_rhAMPSeq_triplicates -p 0 -d 1620 -m collapse_BE_OT -f C -t ATCG -s 20200619_1620_rhAMPSeq_samples.csv -r 1620_ONESeq_ref_seqs.csv --CRISPRessoPooled
collapse_OT mode
Rscript crispresso_downstream_v2.0.37.R -c /Users/IND_off_target/2020_CRISPResso2/20200710_DE_1450_rhAMPSeq -d OT -m collapse_OT -p 0 -n -s 202006_DE_1450_rhAMPSeq_samples.csv -r 202006_1450_0000_ref_seqs.csv --CRISPRessoPooled
OT_only mode (BE summary exists)
Rscript crispresso_downstream_v2.0.37.R -c /Users/local_Jing_BE/1620/20200326_1620_ONESeq_rhAMPSeq_CRISPRessoPooled -p 0 -d 1620 -m OT_only -f C -t T -s ../BE_rhAMPSeq_samples.csv -r ../1620_ONESeq_ref_seqs.csv --CRISPRessoPooled -b
OT_only mode
Rscript crispresso_downstream_v2.0.37.R -c /Users/IND_off_target/2020_CRISPResso2/20200710_DE_1450_rhAMPSeq -d OT -m OT_only -p 0 -n -s 202006_DE_1450_rhAMPSeq_samples.csv -r 202006_1450_0000_ref_seqs.csv --CRISPRessoPooled
YYYYMMDD_CRISPResso_OT_editing_summary.csv
A table displaying the total editing frequency per off-target, per sample.
Columns:
off_target = off-target names
guide_sequence = guide/off-target sequence (no PAM) including mismatches and bulges
pam = PAM sequence
sample = sample name
condition = control or edited
aligned_guide_seq = guide/off-target sequence without mismatches/bulges (the sequence used for alignment)
editing_freq = total editing frequncy per off-target, per sample
reads = number of edited reads
group = editing condition + sample name (same as in off-target composite figure below)
YYYYMMDD_CRISPResso_OTs_ttest.csv
This table displays summary statistics as well as the results of simple statistical tests comparing edited v. mock samples by off-target. Editing frequency (%) is pooled by off-target for all mock samples and all edited samples, and an independent, one-tailed t-test (95% confidence interval, Welch df) is performed to determine whether editing frequency is higher in edited samples compared to mock samples for each off-target. If fewer than 3 samples are included for either the edited or mock group, no statistical test is performed for the off-target. Likewise, if the variance is equal, no test is performed. In both cases, the p-value is NA in the output table.
In conjunction with the t.test, Cohen’s d (effect size) is also calculated using a 95% confidence interval (not pooled, Hedges correction applied).
Columns:
off_target = off-target names
edited/control_median = median value of edited/control editing frequencies (%) by off-target
edited/control_mean = mean value of edited/control editing frequencies (%) by off-target
edited/control_sd = standard deviation of edited/control editing frequencies (%) by off-target
ttest_p_value = independent, one-tailed t-test (95% confidence interval, Welch df) p-value comparing edited v. mock samples
eff_size = Cohen’s d (effect size) of difference between edited v. mock editing frequencies
significant = logical indicating whether the editing was significant for the off-target (for easy filtering)
off_target_Rplot.png/pdf
Example figure below. The heatmap on the left displays the read coverage per amplicon, per sample. The off-target names and sequences in the center are followed by an * when the editing frequency in edited samples is significantly higher than that in mock/control samples. The editing frequency is displayed in the dotplot on the right.